14 research outputs found

    Measuring the functional sequence complexity of proteins

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>Abel and Trevors have delineated three aspects of sequence complexity, Random Sequence Complexity (RSC), Ordered Sequence Complexity (OSC) and Functional Sequence Complexity (FSC) observed in biosequences such as proteins. In this paper, we provide a method to measure functional sequence complexity.</p> <p>Methods and Results</p> <p>We have extended Shannon uncertainty by incorporating the data variable with a functionality variable. The resulting measured unit, which we call Functional bit (Fit), is calculated from the sequence data jointly with the defined functionality variable. To demonstrate the relevance to functional bioinformatics, a method to measure functional sequence complexity was developed and applied to 35 protein families. Considerations were made in determining how the measure can be used to correlate functionality when relating to the whole molecule and sub-molecule. In the experiment, we show that when the proposed measure is applied to the aligned protein sequences of ubiquitin, 6 of the 7 highest value sites correlate with the binding domain.</p> <p>Conclusion</p> <p>For future extensions, measures of functional bioinformatics may provide a means to evaluate potential evolving pathways from effects such as mutations, as well as analyzing the internal structural and functional relationships within the 3-D structure of proteins.</p

    Efficient pairwise RNA structure prediction and alignment using sequence alignment constraints

    Get PDF
    BACKGROUND: We are interested in the problem of predicting secondary structure for small sets of homologous RNAs, by incorporating limited comparative sequence information into an RNA folding model. The Sankoff algorithm for simultaneous RNA folding and alignment is a basis for approaches to this problem. There are two open problems in applying a Sankoff algorithm: development of a good unified scoring system for alignment and folding and development of practical heuristics for dealing with the computational complexity of the algorithm. RESULTS: We use probabilistic models (pair stochastic context-free grammars, pairSCFGs) as a unifying framework for scoring pairwise alignment and folding. A constrained version of the pairSCFG structural alignment algorithm was developed which assumes knowledge of a few confidently aligned positions (pins). These pins are selected based on the posterior probabilities of a probabilistic pairwise sequence alignment. CONCLUSION: Pairwise RNA structural alignment improves on structure prediction accuracy relative to single sequence folding. Constraining on alignment is a straightforward method of reducing the runtime and memory requirements of the algorithm. Five practical implementations of the pairwise Sankoff algorithm – this work (Consan), David Mathews' Dynalign, Ian Holmes' Stemloc, Ivo Hofacker's PMcomp, and Jan Gorodkin's FOLDALIGN – have comparable overall performance with different strengths and weaknesses

    Computing Highly Correlated Positions Using Mutual Information and Graph Theory for G Protein-Coupled Receptors

    Get PDF
    G protein-coupled receptors (GPCRs) are a superfamily of seven transmembrane-spanning proteins involved in a wide array of physiological functions and are the most common targets of pharmaceuticals. This study aims to identify a cohort or clique of positions that share high mutual information. Using a multiple sequence alignment of the transmembrane (TM) domains, we calculated the mutual information between all inter-TM pairs of aligned positions and ranked the pairs by mutual information. A mutual information graph was constructed with vertices that corresponded to TM positions and edges between vertices were drawn if the mutual information exceeded a threshold of statistical significance. Positions with high degree (i.e. had significant mutual information with a large number of other positions) were found to line a well defined inter-TM ligand binding cavity for class A as well as class C GPCRs. Although the natural ligands of class C receptors bind to their extracellular N-terminal domains, the possibility of modulating their activity through ligands that bind to their helical bundle has been reported. Such positions were not found for class B GPCRs, in agreement with the observation that there are not known ligands that bind within their TM helical bundle. All identified key positions formed a clique within the MI graph of interest. For a subset of class A receptors we also considered the alignment of a portion of the second extracellular loop, and found that the two positions adjacent to the conserved Cys that bridges the loop with the TM3 qualified as key positions. Our algorithm may be useful for localizing topologically conserved regions in other protein families

    Vinylsulfonates in the carbene cyclization cycloaddition cascade reaction

    No full text
    Rising Stars of Research (RSR) 2009 - National Undergraduate Science and Engineering Research Poster Competition, Vancouver, B.C., 19-22 August 2009

    An Evolutionary Clustering Algorithm for Gene Expression Microarray Data Analysis

    No full text
    Clustering is concerned with the discovery of interesting groupings of records in a database. Many algorithms have been developed to tackle clustering problems in a variety of application domains. In particular, some of them have been used in bioinformatics research to uncover inherent clusters in gene expression microarray data. In this paper, we show how some popular clustering algorithms have been used for this purpose. Based on experiments using simulated and real data, we also show that the performance of these algorithms can be further improved. For more effective clustering of gene expression microarray data, which is typically characterized by a lot of noise, we propose a novel evolutionary algorithm called evolutionary clustering (EvoCluster). EvoCluster encodes an entire cluster grouping in a chromosome so that each gene in the chromosome encodes one cluster. Based on such encoding scheme, it makes use of a set of reproduction operators to facilitate the exchange of grouping information between chromosomes. The fitness function that the EvoCluster adopts is able to differentiate between how relevant a feature value is in determining a particular cluster grouping. As such, instead of just local pairwise distances, it also takes into consideration how clusters are arranged globally. Unlike many popular clustering algorithms, EvoCluster does not require the number of clusters to be decided in advance. Also, patterns hidden in each cluster can be explicitly revealed and presented for easy interpretation even by casual users. For performance evaluation, we have tested EvoCluster using both simulated and real data. Experimental results show that it can be very effective and robust even in the presence of noise and missing values. Also, when correlating the gene expression microarray data with DNA sequences, we were able to uncover significant biological binding sites (both previously known and unknown) in each cluster discovered by EvoCluster.Department of Computin

    The rhodium-catalyzed carbene cyclization cycloaddition cascade reaction of vinylsulfonates

    No full text
    Vinylsulfonates have proved to be excellent dipolarophiles for carbonyl ylides derived from diazoketones in rhodium-catalyzed intramolecular cycloadditions. Polyfunctional substrates, such as 8 and (+)-15, were readily available from hydroxy esters, e.g. 1 and the cyclopenta-1,3-dione 10, respectively, and the resulting polycyclic sultones were formed under mild reaction conditions in high yields with very good diastereoselectivities. A ruthenium-catalyzed asymmetric transfer hydrogenation was found to desymmetrize the meso-cyclopenta-1,3-dione 12 efficiently. Β© 2009 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim.link_to_subscribed_fulltex

    Report of Doctors' Fee Survey 1994

    No full text

    Information theory and the ethylene genetic network

    No full text
    The original aim of the Information Theory (IT) was to solve a purely technical problem: to increase the performance of communication systems, which are constantly affected by interferences that diminish the quality of the transmitted information. That is, the theory deals only with the problem of transmitting with the maximal precision the symbols constituting a message. In Shannon's theory messages are characterized only by their probabilities, regardless of their value or meaning. As for its present day status, it is generally acknowledged that Information Theory has solid mathematical foundations and has fruitful strong links with Physics in both theoretical and experimental areas. However, many applications of Information Theory to Biology are limited to using it as a technical tool to analyze biopolymers, such as DNA, RNA or protein sequences. The main point of discussion about the applicability of IT to explain the information flow in biological systems is that in a classic communication channel, the symbols that conform the coded message are transmitted one by one in an independent form through a noisy communication channel, and noise can alter each of the symbols, distorting the message; in contrast, in a genetic communication channel the coded messages are not transmitted in the form of symbols but signaling cascades transmit them. Consequently, the information flow from the emitter to the effector is due to a series of coupled physicochemical processes that must ensure the accurate transmission of the message. In this review we discussed a novel proposal to overcome this difficulty, which consists of the modeling of gene expression with a stochastic approach that allows Shannon entropy (H) to be directly used to measure the amount of uncertainty that the genetic machinery has in relation to the correct decoding of a message transmitted into the nucleus by a signaling pathway. From the value of H we can define a function I that measures the amount of information content in the input message that the cell's genetic machinery is processing during a given time interval. Furthermore, combining Information Theory with the frequency response analysis of dynamical systems we can examine the cell's genetic response to input signals with varying frequencies, amplitude and form, in order to determine if the cell can distinguish between different regimes of information flow from the environment. In the particular case of the ethylene signaling pathway, the amount of information managed by the root cell of Arabidopsis can be correlated with the frequency of the input signal. The ethylene signaling pathway cuts off very low and very high frequencies, allowing a window of frequency response in which the nucleus reads the incoming message as a varying input. Outside of this window the nucleus reads the input message as an approximately non-varying one. This frequency response analysis is also useful to estimate the rate of information transfer during the transport of each new ERF1 molecule into the nucleus. Additionally, application of Information Theory to analysis of the flow of information in the ethylene signaling pathway provides a deeper insight in the form in which the transition between auxin and ethylene hormonal activity occurs during a circadian cycle. An ambitious goal for the future would be to use Information Theory as a theoretical foundation for a suitable model of the information flow that runs at each level and through all levels of biological organization
    corecore